Multi-armed bandit
Reinforcement learning problem exemplifying the exploration–exploitation tradeoff
In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed, limited set of resources must be allocated among competing (alternative) choices so as to maximize the expected gain, when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or as resources are allocated to that choice. It is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff.
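The tradeoff can be made concrete with the epsilon-greedy strategy, one of the simplest approaches to the problem: with probability ε the agent explores by pulling a random arm, and otherwise it exploits the arm with the highest estimated reward so far. The following is a minimal sketch, assuming Bernoulli-distributed arm rewards; the function name epsilon_greedy_bandit and the specific reward probabilities are illustrative, not a reference implementation.

```python
import random

def epsilon_greedy_bandit(true_means, n_steps=10_000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli K-armed bandit.

    true_means: hypothetical per-arm reward probabilities (unknown to the agent).
    Returns the total reward collected and the per-arm value estimates.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls of each arm
    estimates = [0.0] * k   # running average reward of each arm
    total_reward = 0.0

    for _ in range(n_steps):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])

        # Sample a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        total_reward += reward

        # Incrementally update the sample-mean estimate for this arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return total_reward, estimates

if __name__ == "__main__":
    reward, est = epsilon_greedy_bandit([0.2, 0.5, 0.75])
    print(f"total reward: {reward:.0f}, estimates: {[round(e, 3) for e in est]}")
```

With ε = 0.1, roughly 10% of pulls are spent exploring, which lets the estimates of all arms converge while most pulls are still directed at the apparently best arm; setting ε = 0 would risk locking onto a suboptimal arm whose early samples happened to look good.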